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ABSTRACT 

Considered within this paper is the problem of minimi- 
zation of a function of unconstrained variables. A wide 
variety of solutions to this problem is presented and the 
possible advantages of each method are discussed. For the 
purpose of this paper these techniques are divided into 
four broad categories: general search directions; conjugate 

search directions; Cauchy's Steepest Descent and Newton's 
method; and variable metric methods. 
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I. INTRODUCTION 



The problem under consideration in this paper is that 
of minimizing £, a function of n unconstrained variables. 
This is a problem that arises frequently in many widely 
varied fields. In general, it may be more likely to find 
that the variables of the function have certain constraints 
placed upon them. While this problem is conceptually more 
involved it is felt that the key to its solution lies in 
the solution of the unconstrained problem. Thus a great 
deal of attention has been given to this latter problem. 

It is felt that if the unconstrained problem can be solved 
then its method of solution can be applied to the con- 
strained problem, by the technique of adjoining penalty 
functions corresponding to the constraints. 

The minimization of a function is certainly not a new 
problem to mathematics. Famous scientists such as Cauchy 
and Newton long ago devised methods of solution. In fact, 
their methods remain useful today and will be two of those 
discussed in this paper. However, the functions to be 
minimized have become more and more complex. The classical 
methods of Cauchy and Newton are often found to be inade- 
quate . 

Thus since the late 1950 's there has been a great deal 
of research in this field. The key to this surge has been 
the computer. Difficult and complex methods of solution 
would be useless without the computer; but its availability 
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has allowed the development of many new ingenious minimiza- 
tion techniques which would have previously been impossible 
to implement. 

With this new research, one fact has become apparent. 

No one method seems to be the most efficient for every 
type of function. Most methods have certain characteristics 
which make them more useful in some cases than others. 
Therefore, apparently there is no simple solution to the 
minimization problem. 

Thus, the purpose of this paper is to present a wide 
selection of these new and old methods of solution. Their 
relative advantages and disadvantages will be discussed in 
the hope that the most efficient method can be selected for 
the function to be minimized. Unfortunately the complexity 
of this field and the lack of sufficient time has not per- 
mitted the author to program many of the methods to be 
discussed. Since this is important to the rating of the 
relative efficiencies of these methods, the findings of 
other researchers will be used and referenced to allow fur- 
ther in depth study of specific problems. 
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II. GENERAL SEARCH DIRECTIONS 



A. GRID METHOD [9] 

As we go from the first to the fourth category we go 
to more and more sophisticated methods. The first method 
is quite simple, but its applicability is very limited. 

Its* advantage is the ease with which it can be programmed. 

This method, to be useful, requires that there be 
some knowledge of the location of the minimum. Therefore, 
assume that the minimum, X = (Xj , . . . ,x ) , is known to lie 
within a certain rectangular region defined as follows: 
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where r^ is a positive integer. 

The function is evaluated at each point of this grid 
and the smallest value is taken as the minimum of the func- 
tion. The difficulties with this method are obvious. To 
obtain a good approximation to the minimum it would be 
necessary to make each s^ "small." But decreasing the 
size of s- increases the number of points at which the 
function must be evaluated. The number of evaluations 
needed is M = (r : +l) (r 2 +l) . . . (r +1) . If n is large then 
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M can be so great that the method would require far too 
many evaluations for it to be useful. 

B. ALTERNATING VARIABLE METHOD 

In this method a basic technique is introduced that 
will be employed in most of the methods that follow. A 
point and a direction are chosen in some way, and the mini 
mum value is sought along the resulting line. A direction 
is a vector in n-space which is usually represented by a 
column matrix. At this time it will, be assumed that, 
given a straight line, the point on that line at which the 
function has a minimum value can be determined. Since 
this is a secondary problem and its solution is rather 
elementary it will be dealt with later, in Appendix A. 
Included there are some techniques for treating the basic 
problem. 

This minimization process is characterized by the use 
of permanently fixed search directions. Generally, the 
directions are chosen parallel to the various co-ordinate 
axes. Each variable in turn is changed or perturbed and 
a search carried out so as to minimize the function on 
that line. The effect, as shown below, is that of a 
staircase, in which the steps decrease in size near the 
minimum, if the function is a quadratic. 

In n dimensions, a function whose level surfaces were 
hyperspheres would be minimized in n searches. But if a 
function is not of this nice nature then certain difficult 
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problems can arise. For practical use there must be some 
means of determining when the process should be halted. 
There are many such rules, called convergence criteria, 
which can be used. One of these criteria is based upon 
the change in the value of the function over one itera- 
tion [9]. It has been suggested for some methods that if 
this change is less than some e then the process should be 
terminated. But one of the problems that may be encoun- 
tered in the Alternating Variable Method is that if the 
principle axis of the function is not aligned, at least 
approximately with one of the co-ordinate axes, progress 
along each search direction may be very small. In this 
case the choice of the convergence criterion discussed 
above may lead to a halt of the process well before the 
actual minimum is reached. Since many functions arising 
in problems are not of the "nice" hyperspherical nature, 
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the Alternating Variable Search Method is often too sim- 
ple for good results [9]. 

C. METHOD OF HOOKE AND JEEVES 

The method invented by Hooke and Jeeves attempts to 
improve the inflexible search routine discussed above. 

To do this the concepts of exploratory moves and pattern 
moves are introduced [3,9]. 

The exploratory process resembles the alternating 
variable search technique in that it uses the co-ordinate 
directions to search along. However, it is not assumed 
that the minimum along each line can be found. Instead 
x^ is perturbed by an amount d^ while the other variables 
are held fixed. If the functional value is decreased 
with this step then the new point replaces the previous 
one and the next variable is considered. If the function 
is not decreased then the original x^ is perturbed by -d^, 
and again the functional values are compared. This new 
point may or may not replace the old one, but in either 
case the next variable is then considered. One cycle is 
complete when all the variables in turn have been per- 
turbed . 

The next step taken is what Hooke and Jeeves call a 
"pattern” move, which is made from the last point arrived 
at during the exploratory phase. Let us call this last 
point a n and let ao be the point at \tfhich the cycle started. 
The pattern move will then be to the point 2a n -a 0 The purpose 
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of this is to make another move in the general direction 
of total progress made during the previous cycle. From 
this point a new round of exploratory moves is performed 
and the functional value at the last point is compared 
with the value of the function at a n> The entire process 
is then repeated from the point which had the smaller 
functional value. 

This is continued until no progress is made during a 
cycle of exploratory moves, which may indicate that the 
present point is within d^ of the minimum or it may be 
that the minimum point lies in a steep skew valley. For 
further progress then, d^ must be reduced before the pro- 
cess is continued. When d^ becomes less then some speci- 
fied e it is assumed that the operation has converged. 

A slow rate of convergence is often a very real prob- 
lem with this method. The choice of d. is critical.- If 

1 

the initial point is far from the minimum and d^ is rela- 
tively small then the process would be very time consuming 
with a great number of functional evaluations needed. 

Even with its disadvantages, this method is the first 
example of a principle which will be applied over and over 
later in this paper. The method attempts to use past in- 
formation that has been obtained about the function. Thus 
the process calls for a move in the direction of progress 
during the exploratory moves which were made to indicate 
the general local nature of the function. The incorpora- 
tion of any previous knowledge obtained about the function 
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is an important characteristic of most of the more involved 
minimization methods. 

D. SIMPLEX METHOD: NELDEK AND MEADE (1965) 

In this method a simplex is defined: this is ini- 

tially a configuration of n-KL equally spaced points in n 
space; for example an equilateral triangle in two space. 

The method presented by Nelder and Meade was designed 
to eliminate some of the problems that arose in early sim- 
plex methods [3,9]. Unlike some of these other methods 
this one does not require that the simplex remain equilat- 
eral. With this greater flexibility in shape it may pos- 
sibly be easier for the method to follow the contours of 
the function and thus not be obstructed by something such 
as a steep skew valley. 

The first step is to evaluate the function at each of 
the vertices of the simplex. The vertex at which the func- 
tion is maximum, V-^, is then reflected through the cen- 
troid, C, of the other vertices. 

V„ oir = (1+a) C - aV, 
new 1 

or 

- C | 
new 1 

C - I • 

If a=l this method is simply one of the earlier methods in 

which the simplex remains equilateral. 

After the function is evaluated at V this value 

new 

is compared with the values at the other vertices. There 
are four different cases which must be considered. 
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i. First assume that £(V ) is less than the pre- 

new r 

vious second largest value o£ the function at a vertex but 
larger than the value at some other vertex. Since the 
point at which the function was largest was the point that 
was reflected this case implies that the second largest 
value of the function has become the largest. Thus V 

new 

replaces V-^ and the process continues. 

ii. Second, assume f(V ) is less than the value of 
the function at all the vertices. This indicates that a 
direction has been found along which the function can be 
greatly reduced. Thus it might be wise to investigate 
this direction, further, which would not be the case if 
condition i. held. To take a step further in this direc- 
tion therefore calls for the use of an expansion coeffi- 
cient y > 1. 

Define : 

V = yV + (l-y)C . 

The result of this process is to define a new point V e 

which is on the same line with V but is farther from 

new 

the centroid. If the value of the function at is less 

e 

than f(V ) then V replaces V, . Otherwise the point 
new' e r 1 r 

V , r replaces V. . In either case the process is then con- 
new r 1 r 

tinued . 

iii. Third, assume f (V )<f(V, ) but f(V ) is still 
greater than the values of the function at the other ver- 
tices. This could indicate that there is a relative 
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minimum somewhere between V and V. . Thus it may be 
desirable not to go quite as far as called for by the re- 
flection. This can be accomplished through the use of a 
contraction coefficient 6 < 1. 

Let us take: 

v c = BV new + (l-B)C, where 0 < 6 < 1. 

Again the function is evaluated at this new point and this 
value is compared with the values of the function at the 
other vertices. If the result is still a maximum then the 
contraction is considered a failure and a different strat- 
egy must be used. Otherwise V c replaces and the process 
continues . 

A failure in the contraction could indicate that the 
simplex has entered a steep skew valley or that a minimum 
is being approached. In either case the size of the sim- 
plex must be reduced so that further progress may be made. 
The natural way of doing this is to cut by one half the 
distance of each vertex from the vertex at which the func- 
tion was a minimum. Thus the simplex is shifted toward 
what should be a more favorable area. After doing this 
the reflection process is again continued. 

iv. Fourth, assume f (V new ) (V^) . In this case the 
reflected direction does not seem very favorable so it is 
rejected. Again a reduction in the size of the simplex is 
called for before the procedure is continued. 

The following two criteria could be used to halt the 
process : 
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i . Let 



s = 



n+1 



i=l 



(q - F) 

n 



be the standard deviation. Then if s is less than some 
specified number, convergence might be assumed. In a 
steep skew valley this criterion could cause a premature 
halt and thus if this problem is feared the following 
criterion should be employed. 

ii. Calculate s after each k function evaluations. 
Convergence would be assumed if successive s's were less 
than some specified number and the difference between two 
successive f's was less than some small number. 

This simplex method is best suited for problems in 
which the number of variables is small. The process is 
relatively slow and with a large number of variables the 
required computer time could become intolerably large. 

One technique that should probably be checked is whether 
it might not be better to reflect through the centroid 
of some of the vertices with smaller function values 
rather than the centroid of all the vertices. Also in 
the expansion phase of the process it would probably be 
better to continue expanding until a failure is achieved. 
Since this direction is favorable why should only one ex- 
pansion be attempted? Since this expansion would result 
in a new point that may be quite a distance from the re- 
mainder of the simplex it would then be wise to reduce the 
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distances of the other vertices from this point. But then 
this has the effect of shifting the simplex toward what 
should be a more favorable area. 

E . ROSENBROCK ' S METHOD 

This method, devised by Rosenbrock, is a rather ob- 
vious development from the method of Hooke and Jeeves 
discussed earlier [3, 14]. The process is usually started 
by using the co-ordinate directions as the first search 
directions but, in general, any set of n mutually ortho- 
normal direction vectors could be used. As with the method 
of Hooke and Jeeves, each direction is considered individual- 
ly with a step of length d^ taken along it. If the value 
of the function at this new point is less than or equal to 
the value at the original point then the step is termed a 
success. Otherwise it is considered a failure. If a suc- 
cess had resulted then d^ is multiplied by some a > 1. If 
the result was a failure then d^ is multiplied by -$, 

0 < 3 < 1. In either case the next search direction is 
then investigated. This procedure is continued until a 
success and a failure have been obtained in each direction. 
This constitutes the end of one stage. 

After each stage is completed new search directions 
must be defined. This is done through the use of the 
following vectors: 

n 

^ t ~ 1 , • • • , n , 
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where r a unit vector, is the k'th search direction in 
the j'th stage; is the sum of all the steps taken in the 
direction of From this definition it is readily ap- 

parent that a x is the total, progress made during that stage 
a 2 is the total progress made in all the directions other 
than the first, etc. An important property of these vec- 
tors is that they are linearly independent. This property 
results from the choice of the definition for a success. 

The linear independence property of the a^'s would be lost 
if at any time there was no progress made along one of the 
search directions during a stage. At first this seems to 
be a possibility since it could happen that a stage is 
started at a point that minimizes the function along one 
of the search directions. But allowing equality in the 
definition of a success eliminates this problem. During 
the process the step size along such a direction would be 
reduced to such an extent that, for computer use, the value 
of the function at these two points would be the same and 
thus a success is obtained. While this step may get very, 
very small it will still be different from zero and thus 
some progress is always made in every direction. 

The new directions are defined as follows by the Gram 

Schmidt orthogonali zation process. 

k- 1 

s k = a k X! ( a k 4 ) 4 
1=1 

and 



K 



j + l 

k 
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Thus these new vectors form a set of n mutually orthonormal 
search directions. 

There are various criteria that could be used to stop 
this process. A limit could be set upon the number of 
function evaluations to be made during the process. This 
obviously may halt the method well before the minimum is 
reached, but it is helpful in avoiding the use of too much 
computer time. The process could also be terminated if 
| | a j | | is smaller than some given number [3]. This would 
mean that the progress made during one stage was very small. 
This seems to be a quite natural stopping criterion, for 
surely a s the minimum is approached the progress made will 
get less and less. Unfortunately this could also be the 
characteristic for a steep skew valley. If there is a pos- 
sibility that the function has this property then great care 
must be taken to avoid a premature halt in the process. A 
third criterion for convergence could be | a 2 j / | a. 2 | > .3. 

This should be used only if the d^'s are scaled to have 
similar magnitudes [3]. The reasoning behind this crite- 
rion is that |a 2 |/|a x | > .3 indicates that the direction 
of total progress is rapidly changing which is again a char- 
acteristic trait of the function in the vicinity of the min- 
imum. This rapid change could also be present early in the 
process so this convergence criterion should be applied 
only after a number of stages have been completed. 

There is a close relationship between the pattern move 
devised by Hooke and Jeeves and the £•[ defined above. They 
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are both in the direction of total progress made during 
one stage. Rosenbrock's method is far superior, though, 
because of its complete use of the knowledge gained about 
the function as is evident in the generation of the new 
search directions [3,9]. This has the property of align- 
ing the search directions with the principle axis of the 
function. 

The ease with which this method can be adapted to 
computer use and its relative stability has shown it to be 
one of the most useful of the direct search procedures. 

F. DAVIES, SWANN AND CAMPEY (1964) 

This method is a further refinement of the useful 
method invented by Rosenbrock [3,9], It attempts to re- 
move the restriction of a fixed step length. As with 
Rosenbrock's method, the search directions are n mutually 
orthonormal vectors chosen initially as parallel to the 
co-ordinate axes. In this case, though, it will be as- 
sumed that, within a certain degree of accuracy, the mini- 
mum along each search direction can be found. This 
assumption introduces one difficulty that Rosenbrock's 
method does not have. 

It may not be possible to make progress along a cer- 
tain search direction and thus the vectors, a., defined in 

l 

Rosenbrock's method, would not be linearly independent. 
Assume that there can be formed (n-m) linearly independent 
vectors. These vectors are orthogonalized as described 
previously and become (n-m) search directions. The other 



18 



m directions needed are those along which no progress was 
made during the previous cycle. Since none of the new 
vectors have components in the direction of these m vec- 
tors and the m vectors were already orthonormal it is 
evident that again the process is begun with n mutually 
orthonormal vectors. Convergence criteria similar to those 
suggested with Rosenbrock' s method can be applied to this 
method . 

The assumption that was made in regard to minimization 
along a line introduces an added factor that must be con- 
sidered when choosing between this method and the one de- 
vised by Rosenbrock. Depending upon the technique used 
.to obtain this minimum a larger number of function evalua- 
tions might be needed. Thus, if this latest method did 
not significantly increase the rate of convergence the 
additional evaluations required might indicate that 
Rosenbrock' s method is better in that case. Also when far 
from the minimum there is no real advantage to obtaining 
the exact minimum along any specific direction. Thus again 
the fixed step length may have an advantage because of the 
less time required. In general, though, this method has 
been found superior to the method of Rosenbrock [3,9]. 



G. MATRIX ESTIMATOR 



In this section it will be assumed implicitly that the 
function may be approximated by a quadratic, at least in 
some region. A method is developed for determining the 
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coefficients and using these to estimate the point where f 
effects its minimum [6] . 

Let us now consider the case in which the function f 
is a quadratic in the form: 

f ( x) = hx ' Ax +- b ' x + c 

where x' = (Xp.-.jX ). 

Under these conditions if the matrix A is knoxsrn then 
it may be shown that the minimization problem is easy to 
solve, as follows. Let x be the point where the minimum 
occurs and consider: 

Vf(x) = b + Ax 

and 

Vf(x) = 0 = b + Ax. 

If these are subtracted we get: 

Vf(x) - Vf(x) = Ax - Ax 

whence 

x = x -- A" 1 Vf (x) . (1-1) 

Thus (1-1) can be applied to find x if A and Vf(x) can be 

determined. The matrix A can be found as follows: 

lc 1 1c 

Consider a sequence of points x = x + p^, where 
k k 

Pk = X dk and A is selected to minimize f along the line 
defined by x ^ and d^ . Then, 
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f(x k + 1 ) = c + b’(x k +p k ) + Jg(x k + P k ) ’A(x k +P k ) 

= f(x k ) .+ Cb’p k +p^.Ax k ) + ^p£Ap k 
= £(x k ) + V ' £ k p k + P k Ap k /2 . (1-2) 



f(x k ) 



but V'f k+ ^p k 
Hence , 



£(x k+1 ) 



c + b* (x k+1 -p k ) + Jg(x k ' 1 -p x )’A(x k+1 rp k ) 
£(x k+1 ^ - V'£ k+1 p k + ^p^.Ap k /2 (1-3) 

0 because £ was minimized along d k . 

f(x k ) - P] (Ap k /2. (1-4) 



This last relation is very useful in determining the ele- 



ments of the matrix A, as follows. 

Let us define e to be a column vector that is zero 
except for a one in the k t b position and choose dj__^ = e k . 
For this choice of d k equation (1-4) reduces to: 

(1-5) 

k = 0 , 1 , . . . ,n- 1 . 



- 2 (f (x k+1 ) - f(x k )) = 



U ) 



k^ 2 k+l,k+l 



Where a^, i = l,.-..,n are the diagonal elements of A. The 
above equation thus generates the diagonal elements of A 
by the use of function values only. 

To obtain the off-diagonal elements let us define: 



d^j = e 1 + for i = l,...,n-l; j = i+l,...,n. 

Then let A 1 - 1 be the scalar that minimizes f along d^ . . 
This reduces equation (1-4) to the form: 
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( 1 - 6 ) 



(-f(x ktl ) * f(x k ) - (*«)«« - U ij ) ! a ii 

2(X i;i ) 2 



= a . 
ij 



i = 1, . . . ,n-l 



and 



j = i+1, ,n. 

Equations (1-5) and (1-6) thus completely define the 
matrix A. This procedure is accomplished by minimization 
along n(n+l)/2 directions. If It happens that one of the 
X's is zero then it is assigned the value of a very small 
but nonzero constant and equations (1-5) and (1-6) are 
used to generate approximations to a^ and a^j . After A 
has been found the problem remaining is the determination 
of V f (x) . 

For the choice of d^ = e : 



9f 

9x 



= |£CiIl ♦ f a ..(x n - x 1 ) 

i l j - 1 J J J 



if f is quadratic. 

But 9f(x^)/9x^ = 0 ; x*} = x ^ for j = 1 
for j = i+l,...,n by our choice of d^, 
tions reduces (1-7) to the following: 



j 1 u 

. , l : and x . = x . 

’ 3 3 



The above condi- 



(1-7) 



0 



9 f 
9x 



1^1= t a-.(x n - x°). 

i j=i + i ^ j y 



(1-8) 



Complete knowledge of A thus enables the calculation of 
the gradient at any point in the above sequence. A de- 
velopment very similar to that used to derive (1-1) pro- 



duces the following results: 
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(1-9) 



Vf(y) = Vf(x n ) + A(y-x n ) 

x = y -A _1 V£(y) (1-10) 

In general, of course, most functions of interest are 
not quadratic. But since in the neighborhood of the mini- 
mum of the function we will assume generally they closely 
approximate a quadratic, it may be possible to adapt the 
above procedure to an iterative process. Each n(n+l)/2 
searches would produce a new approximation to A. By using 
this approximation in equations (1-9) and (1-10) it may be 
possible to approach the minimum. Unfortunately there is 
no guarantee that far from the minimum this method would 
produce a good or even useful approximation to the matrix 
A. Therefore if only function values are to be used it 
would probably be best to employ one of the other methods, 
such as Rosenbrock ' s , until it is felt that the process has 
reached a point that is reasonably close to the minimum. 
Switching over to this latter method at this time could 
possibly be very valuable because of its exact nature for 
quadratic functions. Of course, in the use of this method 
it must be realized that more storage space will be required 
of the computer, since the matrix A must be stored. And 
since each cycle of this method requires n(n+l)/2 searches 
rather than the previously used n directions, significant 
progress must be made at each stage to warrant its use. 



23 



III. CONJUGATE SEARCH DIRECTIONS 



In this chapter we will consider some methods based 
on the idea of conjugate directions. 

Let us again consider a quadratic function of the form: 



Two directions 
to A if 



f(x) = c 

d . and d . 
i 1 



+ b 'x + x'Ax/2 . 

are called conjugate with respect 



d . Ad'. = 0 , for i f j . 

Conjugate directions play a significant role in recent 
developments in minimization theory, as shall be seen in 
the following discussion. Assume that dj,...,d are n 

mutually conjugate directions. Let x = x^ + A^d^ 

i o n i 

where the A 's are selected to minimize ffx + . E. Ad.)* 

v i=l \ J 

We shall see that the resulting point furnishes the desired 
minimum to f. 

Consider : 

n * n 

f (x) = + A 1 d^) ’A(x° + ^g^ A 1 d^) + b’(x° + ^g^ A 1 d^)+C 

= f(x°) + i g 1 (3 s A?d!Ad i + A i d! (Ax°+b)) . 

Therefore to select the x 1 ' s to minimize f(x + A d^) is the 
same problem as selecting each A 1 to minimize (^(A 1 ) 2 d|Ad^ 

+ A 1 d|(Ax° + b)). Therefore the choice of each A 1 is inde- 
pendent of every other A- 1 , j f i. 
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This implies that if n conjugate directions are used 
it is sufficient to minimize once along each direction to 
obtain the minimum value of the function. Since any ar- 
bitrarily chosen n linearly independent vectors usually do 
not have this property the advantage in using conjugate 
directions is clearly evident. Let us now take up methods 
which in one way or another make use of searches in con- 
jugate directions. 

A. POWELL’S METHOD 

This method depends upon the following manner of gen- 
erating conjugate directions. Assume that the function is 
a positive definite quadratic. Let us pick a direction d,, 
and two points x° and x 2 such that x°-x 2 is not a multiple 
of dj. Let us define: 

x 1 = x° + adj 

x 3 = x 2 + gdj 

where a and B are chosen such that f(x°+ad 1 ) and f(x 1 +Bd J ) 
are the minimum values on their respective lines. 

Then 

dJVf (x 3 ) = dj (Ax 3 +b) = 0 

and 

dJVf(x') = dJ(Ax'+b) = 0 

so that 

d{A(x 3 -x ' ) = 0. (2-1) 
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Equation (2-1) shows that the direction (x 3 -x 1 ) is conju- 
gate to the original search direction. It is this rea- 
soning that Powell used to develop his method [13] . 

As with most methods involving search directions this 
procedure is begun with a choice of n linearly independent 

search directions dj,...^ 1 . The function is then minimized 

i n 

along each direction in succession, with x n being the point 
resulting from the minimization along d^, the last search 
direction. From this the direction (x n -x°) is obtained 
where x° is the initial point. This direction is used as 
another search direction along which to minimize. The re- 
sult of this is a new initial point from which to begin 
another round of searches along n linearly independent di- 



rections 



The 



1 ^ ^ ■ 
-L. 0-0 



_ 1 A -» -v* / 



2.T0 Tsticiincci but 3 d v 3 nc c d 



in index by one as follows: 



d? = d [ , j = 1 ..... ,n- 1 
3 3 + 1 

d o n. 0 

n 

There is a possibility that no progress might be made 
along a certain search direction during any given cycle. 

For example, assume that a cycle is begun at a point which 
minimizes the function along dj. Therefore no further pro- 
gress is made along this direction which means that (x n -x°) 
will have no component along d*. But then d^ is deleted 
from the next round of searches which implies that the n 
search directions will not span the given space. If this 
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problem does not arise then the method will generate n con- 
jugate directions after n stages. The next search stage 
would therefore produce the desired minimum regardless of 
where the initial point was located. Of course, all this 
was done under the assumption that the function was a 
positive definite quadratic. Since, in general, this is 
not in fact the case then the process must be applied 
iteratively . 

The problem of loss of linear independence is a 
serious one, though, and cannot be overlooked. If this 
problem arose the method could not converge no matter how 
long the computer worked on it. Since conjugate directions 
seem to offer significant advantages in the minimization 
problem a refinement of the above method was sought to 
eliminate its flaws. Just such a method was suggested by 
Powell in 1964. 

B. REVISED CONJUGATE DIRECTIONS BY POWELL [12] 

Again it is assumed that the method is begun with n 

1 i 

search directions d x , . . . ,d ; these and each subsequent set 
are to be linearly independent and scaled such that: 

d^'Ad^ = 1 for j = 1 , . . . ,n. 

Let det D = det (d| ' ‘ * d 1 ) . It will now be shown that this 

^ i n-^ 




gate. Let v 1 , i = l,...,n, be a set of n conjugate, nonzero 
scaled vectors. Since they are conjugate and nonzero they 
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must be linearly independent. This implies that each dj 
can be written as a linear combination as follows: 



i n v 

d. = ,E, u.,v K 
j k=l jk 



or 



Cd ...d n ) u ik vk **-k=l u nk vk) 



Therefore 



det D = | C v 1 . . . v 11 ) | | U | 

i i n m n n 

d . 'Ad} = ( Z 1 u. v m )'A( E, u, v P ) 
3 k v m=l jm 1 '■p-l kp J 



( 2 - 2 ) 



n n ni ' n 

= E, E, u. u, v m Av p 
m=l p=l 3 m kp 



n 



since 



Therefore 



It' A P 
v Av^ 



d 1 , Ad^ 
J 1 



= E, u. u v v m Av m 
m=l jm km 



= 0 for m f p . 



1 k -1 u jk u jk * 



(2-3) 



But equation (2-3) shows that the determinant of U can not 
exceed one, which it equals only if U is an orthogonal ma- 
trix. If this is the case: 



n 



d^'Adi = E, 
3 k m=l 



Equation (2-4) implies 

d^, ...jd 1 are mutually 
1 n 

were chosen arbitrarily 



u . u, =0 j f k. (2-4) 

j m km J v J 

therefore that the directions 
conjugate. Therefore since the v 1 ^ 
it can be seen from ( 2 - 2 ) that det 
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D is maximized when the determinant of U is maximized, which 
implied that the given directions were conjugate. This re- 
sult then forms the basis for this new method devised by 
Powell . 

For the most part this new method resembles the first 
method by Powell: after the n search directions are used, 

it is desired to look at the direction x n -x° . Again this 
direction is examined since it appears to be along this 
direction that the process is progressing toward the mini- 
mum. Unlike the earlier method, though, this new direction 
is not automatically accepted as a new search vector. It 
must first be determined whether replacing one of the vec- 
tors in D by this new vector would increase det D. If it 
is increased then it can be reasoned that the directions 
must be approaching conjugacy. Obviously the det D would 
not increase if the replacement made det D = 0. This would 
be the case if the new direction was not linearly indepen- 
dent. Thus by using the new direction only when it in- 
creases det D insures that the linear dependence problem of 
the earlier method is eliminated. The question then arises 
as to which of the old directions should be deleted when 
the new direction is added. 

If we assume that the vectors are scaled such that 
d^'Ad^ = 1 then 



fCx 1 ' 1 ) = 3 s (x i -X i d^) , A(x i -X i d^) + b ' (x 1 - X 1 d^) + C 
f (x^) = Js(x^) ’Ax^ + b'x^ + C 
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f(x 1_1 ) - fCx 1 ) = ^(A 1 ) 2 d k, Ad k - (A 1 d k, Ax*' + A'M k 'b) 

= ^(A 1 ) 2 - V' £(x i )d^ = (A i ) 2 . 

Therefore 



A 1 = ^CfCx 1 ' 1 )-^* 1 )) • 



( 2 - 5 ) 



Now (x n -x°) = A d k +...+A n d k = yd k , where y is chosen so that 

d k -Ad k = 1. n " 

P P 

k k 

If d replaces d. in det D the following results: 

p ^ l & 



,k ,k jk,k ,k 

d , . . . d . , d d. , . . . d 

i l-l p l+l n 



d k ...d. , (— d k +. . .+^d k V . .d k 

i l-IVy i y n J n 



,k ,k A 1 ,k ,k ,k 

d , . . . d . -1 — d . d ....... d 

i l-l y l i+l n 




(det .D) . 



k k 

From this it can be seen that replacing d^ by dp has the ef- 
fect of multiplying the determinant by |A 1 /y|. This multi- 
Pl ication factor is greatest when the largest A 1 is chosen. 
But, since A 1 represents the change in the value of the 
function when minimizing along d^, the largest A corresponds 
to the direction along which the function underwent the 
greatest reduction in value. Thus the new direction should 
replace whichever direction produced the greatest reduction 
in the value of the function as long as | A 1 /y | > 1 . If this 
last inequality is not satisfied for some i, this substi- 
tution would in all instances reduce the value of det D which 
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is contrary to the desired results. As with Powell's first 
method, this process also calls for minimization along x n -x° 
starting from x° to obtain a new point from which to begin 
the next round of searches. This step, though, involves fur- 
ther problems that must be considered. 

Using the three points x° , x n , and 2 x n -x° let us use a 
quadratic interpolation to obtain the minimum along the line 
joining x n and x° . Since these three points are equally 
spaced the following function can be used for this purpose: 

g(t) = at 2 + bt + c for 0 < t < 00 



where 



and 



g(-l) = gl = f(x°) 
g(0) = s 2 = £(x n ) 



g C 1) = g 3 = f (2x n -x°) . 

Solving a system of three equations in three unknowns pro- 
duces the following results: 



a = (g 1 '-2g 2 + g 3 )/2 
b = (g 3 “2 1 ) / 2 

c = g 2 - 



The value of t, t s , for which g(t) has its minimum (maximum) 
value can be found by setting ^pjr(g(t)) equal to zero. Hence, 

t s = -b/2a = (gi-g 3 )/(2(gi-2g 2 +g 3 )) . 
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The value of g at this point is: 



g = at 2 + bt + c 
6 s s s 

= g 2 -(g 1 -g 3 ) 2 /(8(gi-2g 2 +g 3 )) . 

It must first be insured that g g is actually a minimum of 
g and not a maximum. This condition is satisfied if 

g^T (g(t)) = 2a = (g ! - 2g 2 + g 3 ) > 0. 

The point x g corresponding to t g is given as follows: 

x = x n +t (x n -x° ) . 
s s v 

Now consider the position of x g on the line joining x 11 and 

x° . Let yd = (x n -x°) and x = x n + A^d . Then by (2-5) 

P ^ P 

x n + APd p = x n ±d p V 2(f(x n )-f(x s )) 

= x° + d p i 2(f(x°)-f(x s )) . 

The minus sign is to be used if x g is between x° and x n . 

Hence 

(x n -x° ) + d (± "V 2 ( f (x n ) - f (x g ) ) - '2 (f (x°) -f (x g ) ) = 0 

dpA(x n -x° ) +dpAdp (± "V 2 (f (x n ) -"f(x ~) ) - "^2 (f (x° ) - f(x g ) ) =0 . 
But (x n -x°) = ydp , therefore, 

y =;V 2 (f(x n )-f(x s )) + V 2 (f(x°)-f(x s ). 
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Consider the case when x is between x° and x n . 

s 

y = + "V 2 (£ (x 11 ) -f (x g ) ) + "V 2(f (x°) -£ (x g ) ) 

= "^2 (f(x°) - £ (x 11 ) +£(x n ) - £ (x g ) ) * V7(f(x n )-f(x s )). 

Hence 

\ m \>\ 'V2(£( x 0 )-f(x n ) | > | X 1 1 

which implies that; 

| X 1 | / |p | < 1 for £ll i. 

But this last result violates one of the conditions that 
must be satisfied before the substitution of the direction 
(x n -x°) can be made. Therefore, when minimizing along the 
search direction (x n -x°) starting from x 11 , if the minimum 
occurs at a point between x n and x° the direction (x n -x°) 
is not used in the next search cycle. If A is defined to 
be the maximum decrease in the function over any of the 
search directions used, then the above conditions can be 
stated as follows: 

If either g 3 >gi and/or (g x - 2g 2 +g 3 ) (g i - g 2 - A) 2 > (g i - g 3 ) 2 
then the same search directions should be used again and x 11 
should be used as the new initial point. Otherwise x g should 
be used as the new initial point and (x n -x°) should be sub- 
stituted for that direction along which the function de- 
creased the most during the last search cycle. 

Unfortunately this new modification eliminates a use- 
ful property of the earlier method. The previous method 
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would, for a positive definite quadratic, generate n con- 
jugate directions after n search cycles and thus the mini- 
mum would be achieved on the next cycle. The modification 
lacks this quality because of the manner in which new search 
directions are generated. There is no guarantee that a 
newly generated direction that is used to replace another 
one might not itself be eliminated later in the process. 

Thus the property of convergence after n+1 searches will 
most likely be lost. This, though, may not be as serious 
a problem as it seems for most functions to be dealt with 
will not be quadratic anyway. 

A suggested criterion for convergence is to test whether 
the function has decreased in value significantly over a 
search cycle.. But, as has beer, stated previously, for cer- 
tain functions this could produce a premature halt to the 
procedure. Powell has suggested that the following cri- 
terion be used [13] . 

1. Continue the iterative process until the change in 
each variable over one cycle is less than one tenth the re- 
quired accuracy. Let the resulting point in the last cycle 
be a . 



2. Increase each variable by ten times the required 
accuracy and repeat step one, producing the point b. 

3. Minimize along the line joining a and b to obtain 
the point c. Stop the process if the components of (a-c) 
and (b-c) are all less than one tenth the required accu- 
racy . 
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4. Otherwise replace d 1 by (a-c) and start step one 
again . 

It appears as if this convergence criterion is a very 
strict one. Since this method requires a large number of 
functional evaluations, a convergence criterion which is 
too strict could cause the computer to do a great deal more 
work than is necessary. Of course, the nature of the prob- 
lem will dictate the amount of accuracy required, which 
will ultimately affect the proper choice of convergence 
criterion. But in all cases there must be some sacrifice 
in accuracy made to avoid too many evaluations of the func- 
tion . 

The following method v\ras designed in an attempt to 
alleviate another problem that arises in Powell's proce- 
dure. The requirements that must be satisfied before a 
new direction can be defined are much too demanding for 
problems involving a large number of variables. The re- 
sult is that frequently one set of directions is used over 
and over again which, as is readily apparent, is similar 
to the alternating variable method discussed previously. 

It is therefore desirable to reduce these requirements if 
the main characteristics of the method can be maintained. 

C. ZANGWILL'S METHOD 

Zangwill proposed the following revision of Powell's 
method in hopes of increasing what might be a slow rate 
of convergence in the former method [6], The procedure 
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involves the use of two different sets of search directions. 
The set c^, i = l,...,n, is the co-ordinate directions, 
normalized so that |c^| = 1. These directions will remain 
unchanged throughout the entire procedure. The other set 
d{, i = 1,.... ,n, where |d^| = 1., contains n linearly inde- 
pendent directions. These last vectors are used much in 
the same manner as the search directions employed in 
Powell's method. Thus it is this set that will change 
after each search cycle. The process is begun from the 
initial point x°. Then X° is calculated to minimize 
f(x° + X°d 1 ) and x 0 ,-, is defined as the point x°+X°d 1 . 

The value of t initially is set equal to one. The pro- 
cedure then becomes iterative. Thus for the first itera- 
tion t, the point x^ + ^, and the directions df, i = l,...,n, 

are all known. In general, for the iteration assume 

k- 1 k 

that t, the point x and the directions d^ , i = l,...,n, 

are given. The k t ^ 1 iteration proceeds as follows: 



k-1 



(i) Find a to minimize f( x n+ ^ + ac.) . Update t so 
that t is replaced by t+1, if i < t < n, and t is replaced 
by 1, if t = n. If a f 0 let x^ = x^,| + ac. . If a = 0 
repeat step (i) . If step (i) is repeated n times in suc- 
cession then no progress has been made when searching over 
the n co-ordinate directions. This will happen only when 
the minimum has been reached and thus the process should be 

halted. In this case the point at which the function is a 
.k-1 



minimum is x 



n+1' 
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k k. 

(ii) For i = calculate X^ to minimize £(x. ^ 

* xW) and set x* - x*^ * X*d*. Let 

I I (x^-x^~ j) I I . Calculate X^ , to minimize f(x^+X^ d^ ,,) 

‘ 1 v n H+-1*' 1 1 n**-l. n n+1 n+1' 

, 4_k k^,k,k c , jk+1 ,k r • i 

and set x , - x + X -d Set d- = d. , for i = 1, 

n+1 n n+1 n+1 l l + l 9 

• • • y XL » 

Now go to the k+1 iteration. 

It can readily be seen that step (ii) is very similar to 
Powell's method which was discussed earlier. 

Zangwill has proven the following important theorem 
concerning the above method. 

THEOREM: Let f be a quadratic function with a positive 

definite Hessian A. The above procedure stops at an opti- 
mal point in step (i) of iteration k where k < n. (Recall 



+-Vjo4- H)r>T,TOk 

uiia t x vj- v* v. 
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gence only if the linear independence of the search direc- 
tion is maintained.) 

PROOF: The proof will be by induction. Assume that at 

the beginning of the k^h iteration the method has generated 

k k 

k mutually conjugate directions, ^ > . . . >d n . The way in 

which this is done has been discussed earlier in this paper. 

Assume that the procedure does not stop during step (i) . 

Therefore, a new point has been generated, which implies 
k-1 k k k-1 

that x . , f x. . Since x = x . , + ac. where a f 0 and 
n+1 o o n+1 t 

k-1 k-1 

was chosen to minimize f(x + ac.) then f(x ) < f( x .-i)- 

v n+1 t J K <r n+1 

k 

Also since x n was generated by n minimizing searches begin- 
ning from x* then f(x^) > f(x^). Therefore, f(x^) > f( x ^) 

> f(x^ |) which implies that x^ f xj) ]■ . And thus (xj) ?■ 
v n + 1' r n n+1 n+1 
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k k k 

- x ) 4- 0 . During the k-1 iteration d 1i1} ...,d were 
n y 6 n-k+1’ ’ n 

k-1 1 

used as the last k search directions since d. . = d. . 

i + I i 

k — i 

Therefore the points x^ + ^ and were found by minimizing 

k k 

in the k dimensional space spanned by d n _ ^. + -^ , . . . ,d . There 

k k - 1 

fore, as was shown previously, d = x - x f 0 is con- 

r 1 ’ n+1 n+1 n 

k k 

jugate to the directions d , d . Therefore by the 

n-K+i n 

n th searc h cycle n conjugate search directions have been 
generated. 

It remains to be shown that n conjugate vectors are 
linearly independent. Assume that they are not independent 
which implies : 



d k ' = .E. a. d k ' 
1 i i 



V v V V 

d.'AdV = .E. a.dV'AdV = 0. 

3 3 i i 3 

k k 

But since A was assumed to be positive definite, d^.'Ad^.>0, 
which is a contradiction. Thus the assumption that the 
vectors are linearly dependent must be false. Since the 
above argument holds for k = 1 the induction proof is com- 
plete. 



Obviously since most functions of interest will not 
be quadratic away from the minimum this method will not 
generate true conjugate directions. For general functions 
it is not certain that this method is more efficient than 
the other methods presented earlier, when away from the 
minimum. However in the neighborhood of the minimum this 
latest method promises to be the most favorable thus far 
available, of those using only function evaluations. 
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IV. STEEPEST DESCENT AND NEWTON'S METHOD 



The method of Steepest Descent and Newton's method 
have been placed together in this chapter, though the 
theory and procedures of these methods are quite dissimi- 
lar. The Steepest Descent method uses only information 
about the first partial derivatives while Newton's method 
requires knowledge of the second partial derivatives. 

They have been combined in this chapter because they 
are the two classical approaches to the minimization prob- 
lem that remain useful today. They differ so much from 
the more recent methods that they deserve to be in a classi- 
fication of their own. Many of the more recent methods have 
been devised as improvements of these two. 

A. STEEPEST DESCENT BY CAUCHY 

This method was one of the first techniques’ suggested 
to solve the problem under consideration. The main prin- 
ciple involved here is the use of -Vf(x) as the search di- 
rection from x. This selection seems natural since a 
search in this direction from the point x insures that the 
function will at. least initially decrease most rapidly. 

When far from the minimum this direction seems to be the 
most useful for it offers the opportunity to approach the 
minimum in one step rather than having to use n search di- 
rections as in the methods discussed previously. Unfortu- 
nately, as the minimum is approached this direction tends 
to be less and less useful. 
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One reason for this is that round off errors and in- 
accuracy in determining the gradient can have a great ef- 
fect upon the search directions. This is demonstrated by 
the following diagram. 




From the above diagram it can be seen that the direction that 
should be used and the one that is actually used might 
be almost perpendicular. Another problem for some func- 
tions is that this method may generate directions that 
cause the search to oscillate about the principle axis of 
the function with very little progress made in each search. 
Problems such as these significantly reduce the effective- 
ness of this method. 

This method can be used in either of two ways, a 
fixed step size or by minimizing along the search direc- 
tion. The latter technique requires minimization along 
-Vf(x). The fixed step method sets the distance which is 
traveled along the search direction. The function is 
evaluated at this new point and this value is compared 
with the value of the function at the previous point. As 
long as the function is decreased the new point replaces 
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the previous one and the gradient is evaluated at this 
point to determine the next search direction. If one of 
these steps fails to reduce the value of the function 
then the step size is reduced and the process continued. 
Convergence is assumed to occur when the step size is 
reduced below a specified limit. 

Both of these methods have their relative advantages. 
The fixed step technique requires fewer function evalua- 
tions while the minimization process should converge more 
rapidly. Unfortunately, though, neither method will, in 
general, proceed very rapidly when close to the minimum. 

In 1957, Booth suggested that the point nine tenths the 
distance to the minimum along the search direction should 



be used instead of the actual minimum, 



TV. 

A ii 



0 pLITpOSS of tlllS 



is to attempt to reduce the oscillation about the principle 
axis which is typical of this classical method. This sim- 
ple procedure does reduce the problem but not enough to 
make the whole procedure useful for general functions [3] . 



B. NEWTON'S METHOD [3,9] 

Sometimes the Hessian matrix may be known for the 
function to be minimized. Since, as has been suggested, 
it may be useful to take full advantage of all the infor- 
mation obtained about the function, it may be wise to 
search for a method which incorporates this second partial 
derivative information. Newton's method is just such a 
technique . 
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By using a second order Taylor expansion for a quad- 
ratic function the following equation is produced: 



f(x) = f(x+h) = 



f(x) + .E. h. 

j = l 3 



8 2 f 



3 f 



3x . 

J 



n n 

^j = l kh 



h.h, 
1 k 



3x . 3x, 
3 k 



where x is the point at which f is a minimum. But, 
3 f 



3x . 

l 



3f 


n 

+ . Z 


h . 


•• m 

3 2 f 


< 

CP 

X 

P- 

1 


7 3=1 


3 


3x . 3x • 

L j 1 J 



i - 1 , . . . , n 



x 



At the minimum ,(3f/3x^) = 0 and therefore. 



Let 3f/3x. 

l 

implies 



M-= .S_ h. 

3x. 3=1 3 



3 2 f 



3x . 3x . 
3 l 



x 



g. and G . , = 3 2 f/3x.3x, , 

l 3 k 3 k 



i = l,...,n. (3-1) 

Then equation (3-1) 



g 



Gh or h 




g- 



Therefore 



x = x 



g 



since x 



x + h . 



The set of equations (3-1) must be solved to yield h. To 
do this the gradient at the current point must be known 
and the matrix of second partial derivatives must be avail- 
able and evaluated at the minimum, x. This last require- 
ment poses some problems because, in general, the actual 
minimum must be known before this can be done. But if this 
point is known then there is no problem. 
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Newton, used the preceding arguments to devise an itera- 
tive method to solve the minimization problem. When the 
search has reached the neighborhood of the minimum the ma- 
trix G is evaluated at the current point rather than the 
actual minimum. If in the neighborhood of the minimum the 
function approximates a quadratic, the matrix G tends to- 
ward a constant matrix and thus evaluating G at the current 
point should give a reasonable approximation to G when 
evaluated at the minimum. 

Newton's method has been shown to be a very useful and 
powerful minimization technique. But like all techniques 
it does have its limitations. For example, progress to- 
ward the minimum is assured only if G is positive definite 
and the method may actually diverge for general functions. 
Another problem is the time required to generate G and G 
and the storage space needed for these matrices. Since the 
matrix G is only used as an approximation, the time problem 
can be somewhat reduced. This can be done by calculating 
G and G 1 only after each k iterations rather than for each 
new step. Some of these problems are dealt with in the 
following method. 

C. MODIFIED NEWTON'S METHOD [6] 

The method now to be presented is a further refinement 
of the Newton's method that was just discussed. It was 
specifically designed to alleviate certain problems that 
the original Newton's method could not handle. One such 
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problem that will be dealt with is the occurrence of a 
Hessian matrix that might not be positive definite. 

This method requires the selection of X 1 to minimize 
f(x 1 +X 1 d^). As such this process is very similar to most 
of the methods discussed thus far; the key, though, is 
in the selection of d^ , the search direction. This is 
done according to the following rules: 

1. If , the current approximation to A 1 , has a 
negative eigenvalue then d^ should be chosen to satisfy 
the following: 

d.'H.d. < 0 and d.'Vf < 0. (3-2) 

2. If all the eigenvalues of are nonnegative then 
choose d. such that either, 

l ’ 

H.d. = 0 , d.'Vf < 0 (3-3) 

ii 5 l 

or 

H.d. = -Vf. (3-4) 

ii 

Consider the first situation, 

(i) 9f (x 1 + X' 1 d.)/dX. = d.'Vf < 0 at X 1 = 0 

(ii) 9 2 f(x 1 + X 1 d.)/3X? = d.'H.d. < 0 at X 1 = 0. 

K J v i" l ill 

Now (i) implies that f, at least initially, decreases in the 
search direction. If (ii) remains valid as X 1 ^ 00 then, ob- 
viously, the function is. ever decreasing and thus must ap- 
proach -°°. But if this is the case then the minimum has 
been found. Otherwise there must be some place along cV 
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where becomes positive definite or semidef inite . Once 

this area is reached then rule number two can be applied 

until such a time as another area is reached where H. has 

1 

a negative eigenvalue. 

Now consider the second situation. A similar argu- 
ment holds. In this case, d 2 (f (x^ + A^d . ) )/d \ 2 = d.' H . d . =d ! 0 = 0 , 
which also implies that unless an area of positive defi- 
niteness or semidefiniteness is reached along d^ the value 
of the function will again go to -°°. Equation (3-3) ob- 
viously defines just the search direction given in the 
section of Newton's method. 

In practice then this method employs the usual Newton 
search direction when in a region in which the Hessian is 



-v-\ /"N ^4 + T '17' A + o T -*-» U ry -v v* o ^ 4- v »1 /-\ 4- V< /“\ 

pUoxc-xVC Uul nil to . J_l! U CJlo i X mo Uiw 



n ^ 1 a r* + r* ry at.t 

04- ^ O- Vw O ilV^VV 



directions which should take the search process into an 
area where A is positive definite. 

In general, therefore, this modification should im- 
prove the behavior of Newton's method when away from the 
minimum. It should also be able to solve a more general 
class of problems than the classical Newton's method. 
Unfortunately, however, it does not completely remove all 
the problems arising in the use of Newton's method. The 
most significant disadvantage of these latest two methods 
is that both require a great deal of information concern- 
ing the function. For some functions it could be just as 
time consuming to compute second partial derivatives as it 
is to solve the problem by some other procedure. Also 
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difficulties arise in problems of large dimension. Invert- 
ing an n x n matrix requires a great deal of work if it 
can be done at all. For these reasons other methods have 
been developed to approximate H = A 1 
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V. VARIABLE-METRIC METHODS 



This is the last major classification of methods to 
be discussed in this paper. As such it represents some 
of the most recent developments in the field of function 
minimization. The theory behind these methods is rather 
simple in concept. It involves making better and better 
approximations to the matrix H = A 1 , where the function 
to be minimized is assumed to be approximated by a quad- 
ratic function f of the form: f(x) = x'Ax/2 + b'x + c. 

If f were actually quadratic then knowledge of A 1 would 
allow the minimum to be reached in one step, as was shown 
by equation (1-1) , Chapter I. Thus any method which can 
generate this matrix would indeed be valuable. 

A. DAVIDON, FLETCHER, AND POWELL [4,8] 

The original work in this area was presented by 
Davidon in 1959, but Fletcher and Powell took Davidon's 
original method and improved upon it to the extent that 
theirs has become one of the more popular and reliable 
methods available for minimization. Though most of the 
original work was done by Davidon, the notation and argu- 
ments by Fletcher and Powell are more concise and will be 
used in the discussion to follow. 

Let g^ = Vf(x^) and d^ = -H^g^ where H^ is the i^ 
approximation to A 1 . The matrix A is assumed to be posi- 
tive definite and H , the first estimate of H, is usually 

o’ 
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selected as the identity matrix. Note that this selection 
for H 0 produces an initial search direction that is simply 
that of the Steepest Descent method. 

Let us define the vectors 

p. = X^d. = x 1 + 1 - x 1 
r i 1 

and 



q i g i+l ‘ g i ’ 

The vector p^ is the step to the minimum along d^ from' the 
point x 1 , and is the corresponding change in the gradient. 

For this method it is desired to repeatedly update the 
matrix to make better and better approximations to A 1 . 
Consider a recursion formula which generates H^'s with the 
following properties. The set of vectors p 0 ,...,p^., are 
linearly independent and they are eigenvectors of H^^A 
with one as eigenvalues. Then, obviously, H^A will have n 
linearly independent eigenvectors with eigenvalue one. But 
this can occur only if H A = I and thus H = A 1 as desired. 
It will be established that the following recursion formu- 
la satisfies these requirements. 



p i p i 

H. . = H. + ■ 

i+l i p^q i 



-(H.q^CH^.)' 

q.'H.q. 

M i l l i 



( 4 - 1 ) 



First let us show that p^ is an eigenvector of H^ + ^A with 
eigenvalue one. To do this consider: 



q i = g i+l • g i 

= (Ax 1+1 + b) - (Ax 1 + b) 
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Hence 



Ax 



i + 1 



Ax 



1 







(4-2) 



Finally, if the following two results can be established 
then the desired properties of p^, .i = l,...,n, will be 
established. 



Equation (4-3) implies conjugacy which has been shown to 
require linear independence, and equation (4-4) is the de- 
sired result concerning eigenvalues and eigenvectors. Equa- 
tions (4-3) and (4-4) will be established by induction. 
Consider equation (4-4) with k = 1, 



p^’Apj =0 0 < i < j < k 



(4-3) 



H k Ap i = p i 0 - 1 < k ' 



(4-4) 



H x Ap 0 = p 0 



by (4-2). Consider (4-3) with k = 2, 



P o ' A P j = (P i ' Ap 0 ) ' = ( -^ 1 g j ’H jAp 0 ) 



or 



Po 'APx = 'p 0 = 0, 



(4-6) 
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since we minimize along p . 
Now assume that 



Pi'APj = P» 0 < i < j < k, 

and 

H i Ap i = p i} 0 < i < k. (4-7) 

Consider : 

g k = Axk + b 

= A(x 1+1 +p i+1 + . . • + P k _ 1 ) + b 

= g i+1 + A(P i+1 + ... + P k .i). 

Hence, we see that 

Pi' g k = Pi' g i + 1 + Pi ,A Pi + l + *** + Pi :A Pk-l 

= Pi '§1+1 » b y ( 4-6 ) » 

= 0 . 

But , 

0 ■ Pi’Sk - ( H k A Pi)'8k 

= Pi’AH k g k 

= Pi'At^k 5 

■ -P i 'A(p k /X k ) ; 

and hence 

Pi' A Pk = 0 < i < k. 
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But this is equivalent to the following: 



Pi'Apj =0 , 0 < i < j <k+l. 



(4-8) 



Also , 



H 



k .i A Pi * H k APi 



PkPk' A Pi ^kV^k’^Pi) 



Pi'^k 



q k’ H k q k 



Pi * 



PkPk'APi ( H k q k ) ( q k' H k A Pi) 



Pi’ q k 



q k' M k q k 



0 < i < k 



or by (4-8) , 



H k+l Ap i = p i > 0 5 1 < k * (4-9) 

By equations (4-5), (4-6), (4-8), and (4-9) the induction 
proof has been completed. Thus H = A 1 , as was desired. 

An obvious question at this point is what motivated 
the choice of the recursion formula? Consider the follow- 
ing : 

Let dj,...,d n be mutually conjugate directions. 

Then 



n 



(.Z, B-d.d. ' ) Ad = 8 d d 'Ad = d 
'■1 = 1 1 1 1 J s sss s s 



if e s = 



d 'Ad 



Therefore , 





d.d 
i 

d. 'Ad 

l 



n 



= • £-. 
1=1 



(X 1 ) 2 d.d. '/(X 1 ) 2 d. ’Ad. 
^ ' l l v ' l i 
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n 

= . E , p.p.'/p-'Ap. 

1 = 1 *i v i 

n 

= . E, p.p.'/P-'Q- 
i=l *1*1 *i n i 

Thus we see that the second term in the recursion formula 

- i 

was selected to make the approximation approach A 

It will now be shown that the third term on the right 
side of equation (4-1) is added as a correction factor. 

As was shown previously it was necessary that H^ + ^Ap^ = p^ 
to make this method valid. Consider the following: 



H 



i + 1 



H. 






+ 



C. . 

l 



It will now be determined what form must take in order 

to satisfy the condition that H. ,Ap. = p.. 

1 l+l ‘ i r i 



p. = H. ,Ap. 

*1 l+l *i 

“ ^ i AP i + P i P i 'Ap i /p i 'q i + C i Ap i 

= H i Ap i + p i + C i Ap i . 

Hence , 

H.Ap. = -C.Ap. 

l r i i *i 

or 

H.q. = -C.q. . 

i n i i n i 

A solution for to this equation is: 

C • = -H.q.z'/z'q- 

l i n i ' l i 

where z is an arbitrary vector that is not perpendicular to 



*i 
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For this choice of C. : 

x 



-H.q.z’q. 

C^q^ = — = -H-q^ , as desired. 

But since it is desired that be symmetric, z is set 
equal to H^q^. 

Therefore , 



C i = “ H i q i Cq A 'Hi) / CqjL ’ H i q i ) , which is as 

desired . 

Now consider the search direction d.. 

1 



-d.'g. = g.'H-g. 
1 & i 6 i i 6 i 



If it can be shown that g^'lLg^ is positive then it is 
evident that the search direction is always in the direc- 
tion of decreasing function values and thus the A 1 ' s can 
be chosen positive. But if can be shown to be positive 
definite then g.'H.g. > 0 as desired. Since H n is chosen 
to be positive definite it remains to be shown by induction 
argument that by (4-1) if 1L is positive definite then so 

is H. , . Consider 
l + l 



x'H. ,x = x'H.x + 
l + l l 



(x'PiHPj/x) (x'H i q i ) (q i 'H-x) 



p . ' q 



q. ' H . q • 

M i li 



(x'H i x)(q i ’H.q i ) - (x ' H^) ( qi ' H.x) 



q. ' H . q . 

n i i n i 



+ 



(x'p i ) 2 

w 
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But (x , H i x) Cq. ± ' H i q i ) > C x 'H^q^) (q^'H^x) , by Schwartz's in- 
equality. Therefore 

x r H i+1 x > (x'q i ) 2 /p i 'q i , 

with equality only if x and q^ are parallel. Obviously, 
(x'q^) 2 is greater than or equal to xero so it remains to 
be shown that p^'q^ > 0 for their quotient to be greater 
than zero. 



Pi' q i * Pi’C*i + l - «P 

= by minimization along d^, 

= -X 1 d. 'g. 

1 6 i 



= Ag. 'H.g. > 0, 

“l i & i 5 

since was assumed to be positive definite. 

Hence, x'H.^x > 0 for all nontrivial x; this implies H. , 
is positive definite, and thus the proof is complete. 

By proving that is positive definite for all i, 
it has been assured that for a quadratic function that this 
method is completely stable and will produce the minimum in 
at most n steps. The last part of this section concerning 
the positive definiteness was specifically due to Fletcher 
and Powell. 

The ease with which this method can be applied and its 
strong stability make it one of the most useful methods thus 
far discussed. As with all methods it will only be approxi- 
mate for functions which are more general than the quadratic 
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function discussed above; but it would be natural to as- 
sume that it still would usually out perform the other 
techniques . 

Of course, there are some problems present in this 
method also. As with most, Davidon's technique requires 
determining the minimum along a given direction. While 
this may or may not be a major problem it will require 
additional function evaluations which must be considered. 

Also there could be a storage problem, especially for 
large dimensional problems, since at each step an n x n 
matrix must be saved. It is the first of these problems 
that the next method attempts to avoid. It is not unusual 
for the necessary function and gradient evaluations to use 
half of the total computer time required for the solution 
of the problem. Thus, if the number of evaluations is 
reduced, without altering the basic method itself, it would 
be expected that the result would be a more efficient 
method . 

B. MURTAGH AND SARGENT 

This method, devised by Murtagh and Sargent [10], uses 
a recursion formula similar to that employed by Davidon, 
Fletcher, and Powell. The recursion formula to be is used in 
generating A 1 is 

H k+1 = H k + ( ^ p k" Hq k^ p k' Hq k' 1 ’^ q k’ ^ p k' Hq k^ * ( ' 4 ~ 10 - ) 
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As will be shown, the advantage of this method is that it 
does not require minimization along each search direction. 
This new formula can be developed as follows. 

Let us assume as before that the function is approxi- 
mated by a quadratic, f(x) = x’Ax/2 + b'x + c. Then, 



and 



or 



where 



and 



g(x k ) = Ax k + b 
g(x k _i) = Ax k . 1 * b 

s( x k- x k-i> * 

= Ap k 

g k " S(x k ) - S ( x k . 1 
p k “ x k ‘ x k- 1 ’ 



J 



as defined earlier. 

Now let H be an approximation to A 1 . If H were exact 
then Hq k = p k . But since H is not exact there is an error 
involved. Let e be this error, e = p k -Hq k , and consider 
now adding AH to H so that (H+AH)q k = p^_. Then, 



Hq k + AHq k = p k 

AHq k = p k _AHq k = e * 

If AH is chosen so that each column is a multiple of e then 
AHq k is also a multiple of e. Since it is desirable for H 
to remain symmetric the following choice for AH is made: 
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AH = mee ' 



in which m is a constant to be determined. We require 
mee'q k = e. Hence, 

m = l/e'q k = 1/ (P k -Hq k ) ' q k 

and 

AH = CP k -Hq k ) (p k -Hq k ) '/(p k -Hq k ) ’q k . 

This produces the following recursion formula for H. 

H k*i = H k * fPk‘ H< >k )( Pk- H ‘ik) ,/c ik , tPk- H< ’k ) - ( 4 - n > 

It should be recalled at this time that for a quadratic 
function the step to the minimum is given as follows: 

x - x k = -A” 1 g(x R ) . (4-12) 

A similar search direction will be employed in this method 
with the addition of an arbitrary scalar, a k , so that: 

P k = " a k-l H k-l g k‘-l ’ (4-13) 

The scalar is added to this formula because the step in 
(4-12) may at times provide a poor estimate of the distance 
to the minimum. 

The advantage of the recursion formula developed in 
(4-11) is that it is not required to minimize the function 
along each search direction. As was stated previously, 
this requirement for minimization was a disadvantage of 
Davidon's method. Unfortunately, though, this alteration 
does reduce the stability. Murtagh and Sargent prove the 
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following theorem concerning the convergence of their 
method [ 10 ] . 

THEOREM: Assume that the function f(x) is defined on UcE n 

and is such that 

i. f(x) is continuous on ft = {x|xeU; f(x) < c} and 
ft is closed and bounded. 

ii. f (x) has continuous second derivatives on = 
{x|xeU; f (x) < c} and there is a A such that | |H(x) j | < A, 
xeft' . 

Starting at any point x 0 eft' with g(x 0 ) f 0 , we generate a 
sequence x 0 ,Xj , . . . ,x^,x^ + p . . . from 



p k+l x k+l ' x k -a k H k g k* 



P||g k ll 5 l|H k g k M 5 a||g k || 
Uk’ H k g kl * 6 M g kM N H k g kM 



where p, a, and 6 are fixed positive constants, it is always 
possible to choose a finite nonzero a j, at each step such 
that : 



f (x k ) - f(x k+1 ) i ea k 8 k '» k g k > 0 

with e a fixed positive constant less than unity. With 
a^. so chosen, the sequence ( x k ) lies in ft' and tends to 
ft* = {x|xeft'; g(x) = 0} in the sense that the distance 
d(x^,ft*) of Xj. from ft* tends to zero as k-*>°. 
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Murtagh and Sargent's method satisfies the conditions 
of the above theorem if H^. is positive definite for all k. 

A theorem by Caratheodory (1967) can be used to establish 
conditions under which H^. will be positive definite. 

Let 

2 k ■ Pk ' H k-l q k and c k ' q k'V 

Define 

HCt) = H k _ 1 + tz k z k '/c k . (4-14) 

By Caratheodory 1 s theorem H(t) is positive definite in the 
range 0 < t < 1 if H k is positive definite and H(t) is 
nonsingular over this range. Since H k = H k + z p z p'/c k , 
to show that H k is positive definite by assuming that H k _^ 
is positive definite it is sufficient to show that H(t) is 
nonsingular for 0 < t < 1. From (4-14), 

det H(t) = det H ]c _ 1 (1 + tz k ' H k !i z i/ c k ) 

' det H k . 1 (1 + tz i c ' H k! 1 Cp k -H k .iq k )/c k ) 

= dot H kl (1 + tz k 'H k ; iPk /c k - tz k 'q k /c k ) 

= det H k _j_ (l-t+(tz k 1 H'* 1 (-a ]t . 1 H k . 1 g k . 1 ))) 

c k 

= det H k _ 1 (l-t-a k _ 1 t z k ’g k _ 1 /c k ) . 

It is necessary that 

(1 - t - > 0 0 5 t 5 1 (4-15) 
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for Hj, to be positive definite and nonsingular. Since 
a^_-^ is positive; it is necessary that 

Vf=k-i /c k < °- ( 4 - 16 > 



Since a positive definite matrix (usually I) can be chosen 
for H 0 , equation (4-16) is a necessary condition for H^, to 
be a positive definite matrix for all k. Consider the 
numerator and denominator of (4-16) separately. 



z k' g k-l = tPk^k-l 0 ^ ' g k-l 

^•" a k-l H k-l s k-l" H k-l q k^ ' g k-l 
= C" a k-l g k-l' H k-l~ (g k' ' g k-l' )H k-l )g k-l 



= M - a la 'H a -a *H a, f /l - 1 7 

~k- 1 J to k- 1 “k-l & k-l “k k-I°k-l ' ' 



c k = z k'»k 

= z k' ( 8k"Sk-d 

- ( ' a k-.i g k-i' H k-r g k-i' H k-i +g k-i' H k-i )- 

g k' z k' g k-l 

= ' a k-l g k-l ' H k-l g k-l g k H k-l g k +g k-l' H k-l g k 

- z k' g k-l 

(4-18) 

( 1-a k-l^ 8 k-l ' H k-l g k’ g k' H k-l g k~ z k g k-l 

Solving for a kl in equation (4-17) and substituting this 
into (4-18) produces the following result 
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g k-l' H k-l g k-l c k = z k ,g k-l cg k-l’ H k-l g k- g k-l' H k-l g k-l) 



'^ g k' H k-l g k g k-l’ H k-l g k-l" ( ^ g k-l H k-l g k ) 2) * 

(4-19) 

Since is positive definite, g^_ ^ 1 H^, _ ^ - 1 > an ^ t ^ ie 

sign of the quantity on the left side of equation (4-19) 
depends upon the sign of . Schwartz's inequality shows 
that : 



^'"k-iekSk-l'Hk-lSk-rfSk-l’Hk-lSlc) 2 ) i °- C 4 - 2 °) 



Now 



assume that z^'g^-l > 0 and examine (4-17), 



z k' g k-l ^ la k-l^ g k-l' H k-l g k-l ' g k' H k-l g k-l > 

If 0 < ct k l < 1 , 

^ l ot k-l^ s k-l' H k-l g k-l > g k' H k-l g k-l 
and hence 

8k-l' H k-l8k-l > ^'“k-iSk-r 

If a k - i = 1. 



•8k' H k-l8k-l =■ 0 



whence 



g k' H k-l g k-l < °* 

But Hj, is positive definite and therefore g^_ ^ ' H^_ ^g^_ ^>0 . 
Therefore , 
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g k-l' H k-l g k-l 11 s k' H k-l g k-l* 

If °k-l > 

C i ~ _ ^ ) ^ o . 

Hence , 

C 1 - a k-l>*k-l' H k-i*k-i > g k' H k-i g k-i 

which implies that, 

g k' H k-l g k-l < °* 

Again , 

g k-l' H k-l g k-l > 0 
which implies, 

g k-l' H k-l g k-l > g k' H k-l g k-l* 

Therefore z k 'g k > 0 implies that g k -! ' H k-l g k-l > g k " ri k-l g k- 1 ’ 
which by (4-15) implies, 



z k' g k-l tg k-l ' H k-l g k' g k-l ' H k-l g k-l^ < 0 



(4.21) 



By (4-20) and (4-21) then c^ < 0. Therefore if z k'§k-l > 0> 
then z ] c 'g k _i/ c k < 0 which is as required by (4-16). Sim- 
ilarly Cj, > 0 implies that z k 'g k _i < 0, which once again 
satisfies condition (4-16). Unfortunately, though, 

Z k' g k-1 - ® and c k < 0 can occur simultaneously. Then con- 
dition (4-16) is violated and will not be positive defi- 
nite. Since it is necessary that this property is maintained 
this can cause some serious problems. Consider the results 
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= 1. Equations (4-17) and (4-18) 



of taking a step with 
imply : 

z k' 8 k-l ‘ -*k' H k-lSk-l 

c k = " 8 k' M k-l 8 k ‘ z k' 8 k-l' 

If z^'g^.-^ is positive then, obviously, is negative and 
the positive definite requirement is satisfied. Thus it 
might be wise first to take this step with = 1 and 

to test z j c 'g^_^ > 0. If this is the case then there is 
no problem and should be updated by the given recursion 
formula. If this step is taken but Zj-'g^.^ 5 0 then gen- 
erally the function has decreased. This in itself is 
desirable since the process has reached a "better" point. 

If z^'gj, < 0 then it should be tested for c^. > 0. If 
this holds then condition (4-16) again has been satisfied 
and thus the recursion formula should be applied. Here, 
though, it is wise to add a test to ensure that c^. is not 
too close to zero, which would contribute additional prob- 
lems. If this becomes the case or Zj, ' gj. _ i/c^ > 0 then it 
is necessary to start again with a new H 0 from the latest 
best point. Murtagh and Sargent suggest two possible 
choices for this new H 0 , either I or the previous H . . The 
first choice, of course, is simply starting over again with 
no information about H. This could be useful if has be- 
gun to accumulate misinformation concerning the function. 
But, in general, it would probably be best not to destroy 
all the previous information. 
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Murtagh and Sargent offer a number of algorithms em- 
ploying the methods they devised [10]. The one that they 
found to be the most useful involved the checks discussed 
thus far and, in addition, certain checks to ensure that 
the conditions of their theorem were met. By the crite- 
rion of fewest function evaluations required this last 
method was generally found to be the most efficient. 

Again this is as has been anticipated because the need to 
minimize along the search directions was reduced. It was 
found that the conditions of the theorem of Murtagh and 
Sargent was far less restrictive then requiring actual 
minimization. 

If function evaluations was the only criterion then 
it could be said that generally Murtagh and Sargent's 
method was superior to Davidon, Powell, and Fletcher's. 

But because of all the tests that must be made the method 
must surely be more difficult to program and, outside of 
function evaluations, more time consuming to run. In ad- 
dition, since at times the conditions for positive defin- 
iteness can fail and a new H fl must be selected, the 
convergence of the method will obviously be slowed down. 
Should this resetting of H be required too often there is 
no doubt that all advantages this method might have would 
be lost. 

C. PEARSON'S CLASS OF VARIABLE METRIC METHODS 

In this section will be presented a class of related 
methods devised by Pearson [11] . Included in this class is 
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the method invented by Davidon. Though the recursion for- 
mulas of these methods are different, their development is 
closely related as will be shown in the following. 

In the previous section it was shown that for a quad- 
ratic function q k = Ap k where q k = g k -g k _ 1 and p k = x k -x k _ 1 
Therefore, Hq k = p k , where H = A 1 . Consider the following 
possibility. Assume H^.q^ = p^, for i = l,...,j, where 



- 1 



is the j th approximation to A ‘ . If HL can be updated so 

that H n q^ = p^, for i = l,...,n, and if the set {p^; i = 1, 

. . . ,n} is linearly independent then H n = A 1 . This can be 

seen from the following: 

Assume , 

H q • - p • , 

n n i *i ’ 

for i = 1 , . . . ,n. But , 



q t - A Pi> 

for i = l,...,n therefore, 



C H n A)Pi = Pj, 
for i = l,...,n and hence 
(H n A-i) Pi = o: 

But the set {p i >, i = l,...,n, is linearly independent, 
which implies that = A 1 . Now define the search direc- 
tions as before, d^ = H^’g^. 

Then , 

q 'd. = q 'H. 'g. , for s = 1 , . . . ,i-l 
M s l l s 1 6 i’ 
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( 4 - 21 ) 



Thus, if P s 'g^ = 0, then 

q ' d . = 0 
n s 1 

for s = l,...,i-l. But consider what occurs when the function 
is minimized along each search direction. 

H n q i ' Pi 

Pj' AH n q i = Pj' A Pi> 

for j = l,...,i-l. But, 

Pj' A Vi = Pj ,q i 

- Pj'Cgi-gi-i) 

= Pj 'gi'Pj 'gi-l» £or j = 

If j = i-1, then by minimization along d._^, 

p i-i ,g i-i = 0 
and thus 

Pi-i'CSi-Si-P ' Pi-i’Si- 

If j < i-1 then, 

Pj'CSi'Si-P = Pj’gi-Pj'gi-r 
In either case, 

p s ' g i = 0 £or S = 1 » • * • » i_1 
implies 

P-j'APj^ = Pj'gi ' P j ' g^- ! = 0 for j = 1 , . . . ,i-l , 

and therefore each new direction generated is conjugate to 
the previous ones. Thus if the condition (4-22) is satisfied 
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the method generates conjugate directions. The problem 
then is to find solutions to Hj_q^ = p^, for i = l,...,k-l. 
To do this define the following matrices: 



Qi (<-1 1 • • 

Pi = (pj . . .Pi^) . 

In this notation the problem then is to find solutions to 

H . Q . = P . , r a ^ -z\ 

i x i i’ (4-23) 

for i = l,...,n. Consider, 

Hi = P i (Q i 'MQi)' 1 Qi'M + H 0 (I-Qi(Qi'M*Q i )' 1 Qi'M*) (4-24) 

H i Q i = P i (Q i , MQ i )' 1 (Q i , MQ i ) + H 0 IQi-H 0 Qi (Qi ’M*Qi) 



(Q.'M*Q.) 

= P. 

1 

where M and M* are arbitrary. Thus (4-24) defines a solu- 
tion to (4-23). It was given that M and M* were arbitrary 
but this is not completely true since Q ^ ' MQ ^ f 0 and 
Qi'M*Qi f 0. Obviously, if M and M* are chosen to be posi- 
tive definite then Qi'MQi and are unequal to zero. 

There are two matrices that seem to be likely choices for 
M and M* . First there is H which, of course, can and will 
be selected to be positive definite; and then there is A 
which is assumed to be positive definite. 

Since nothing in (4-24) specifies otherwise, M and M * 
can be chosen independently. By doing so, four different 
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forms of (4-24) can be produced. Pearson makes use of a 
lemma called the Bordered Inverse Lemma to produce the 
following recursion formulas from equation (4-24). 



H- . , = H. + (p. -H.q.) (p. ')/p. 'q. 

l + l l i M i*' '•^l •" *i H i 

H . , = H. + (p. -H.q.) (H.q.) 7q- 'H.q. 

l + l l K 1 H i l^i 



(4-25) 

(4-26) 



H i+1 - ^ + p i 'p i /p i 'q i - (H i q i )(H i q.)'/q 1 'H i q i . (4-27) 

Notice that (4-27) is exactly (4-1) which was Fletcher, Powell 
and Davidon's recursion formula. 

It can readily be seen that equations (4-25) and (4-26) 
both produce H^ ' s that will not be symmetric. This is, of 
.course, a slight disadvantage since it will require addition- 



al storage 



O p c 



CGIup cl TG d 



t f ■» ^ T\ m r 1 d V4 t /-» w\ r, h n v> h -» h 

>v x oil uav xuuji me ciiuu Vv il j. cii 



produces symmetric matrices. . This, though, should only 
be significant in problems involving a large number of 
variables. In general, the results so far indicate that the 
three recursion formulas given above produce similar results. 
For some functions it may be necessary to replace the cur- 
rent approximation of A 1 with the positive definite matrix 
that was originally chosen. This happens if the approxima- 
tion becomes singular as the minimum is approached. Gener- 
ally, though, this is not required. 
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VI. CONCLUSIONS 



In the writing of this paper the author has studied 
the literature. Of course, it has by no means covered 
every possible method by which the unconstrained minimiza- 
tion problem may be solved. But an attempt has been made 
to offer as wide a coverage as possible of the different 
techniques which are available. The methods included are 
those which have been found to be the most reliable in 
solving actual problems. Research with computers on 
specific problems have shown that no one method is guaran- 
teed to out perform all others on every problem. In gen- 
eral then, the greatest difficulty might be the actual 
selection of the method to be employed. 

To make this decision it is important to consider all 
the information about the function which is available. 

This includes such things as having second partial deriva- 
tives which may be computed, or the knowledge of only the 
gradients, or only having access to the function values. 
Generally it has been found that gradient methods are usu- 
ally the most reliable, but this is in reference to methods 
applied to functions for which the gradient is available 
from analytic expressions. On occasion, though, for some 
functions the gradient is only calcuable through numerical 
methods. When this is the case the accuracy of the entire 
method is greatly reduced. In fact, in these cases it is 
best, as a rule, to use one of the methods employing only 
function values. 
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For most functions there may be a number of methods 
that can be used to obtain the minimum. Thus if all that 
is required is the proper minimum then the problem of 
selection may be greatly reduced. But in practice there 
are, of course, other important factors which must be 
considered . 

Computer time for the solution is one of these vital 
factors. Consider, for example, the Steepest Descent 
method. For certain functions this method may have an 
extremely slow rate of convergence. But if this technique 
leads to the correct minimum then it must be included among 
the methods from which the one method to be used is selected 
But to select this method in such a case would be a serious 
mistake. There may be another method which could solve the 
prbblem in one tenth the time. With all other factors 
equal this other method would obviously be the better choice 

In general, though, the relative advantages of each 
method are unknown for any given function. It would be 
best to study the function to be minimized before any se- 
lection of a technique is made. If any special character- 
istic of the function can be identified then it may be 
possible to make a wiser selection of the method to be used. 

The author has found the book by Box [3] and the book 
by Kowalik and Osborne [9] to be particularly helpful. 
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APPENDIX A: LINEAR SEARCH TECHNIQUES 



In this appendix \\rill be presented a number of methods 
for finding the minimum of a function along a given line. 
Since many of the minimization techniques discussed in this 
paper require one of these methods their importance cannot 
be minimized. 

1. FIBONACCI SEARCH [3,9] 



Assume that we decide to make N function evaluations 
within the interval (x x - ,x 2 ) where a minimum is known to 
exist. Assume the original neighborhood is designated 
(x*,x 3 ). Then define the following points: 

x i = Pm-^-xJ) + xj 
' 3 ~ 

^N 



X. = 



F N-1 C x 2 ' x !) + x i 



N 



where F is a Fibonacci number, defined by the following 
n 

relations : 



F 



o 



F 



n 




n- 2 



n > 2 . 



If f (xi) > f (x, 1 ) then the minimum must lie between xi and 
x 2 . Otherwise the minimum must lie between xj and xj. 
Consider (x*-x|) and (x^-xj): 



x * - x * 
A 1 



F N-i ( 3 C k x ! ) 

f n 
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(A-l) 



;-x! = x) - f n- 2 c x h x i^ - x ! 



N 



- a-fWV^ • (i- f n . 2 /f n )x; 

^ F N’ F N-2^ X 2 " ^ F N” F N-2^ X 1 



N 



N 



Thus , 



x'-x 1 



= F N-l Cx h x !> 



CA-2) 



N 



Thus (A-l) and (A-2) show that, regardless of which interval 
the minimum has been restricted to, the length of the inter- 
val is (F^_^/F^) times the length of the original interval. 
The end points of the interval containing the minimum are 
the relabeled, x* and x 2 . The process is then repeated 
using the following general formulas. 



;i = F N-l-i (X 2- X i) + 



N+l-i 

x 4 = (x 2 -x^) + Xj 

F N+1- i 

for i = 1,...,N-1. The final two points would, by the above 
formulas, coincide and thus should be offset by some small 
e. The length of the final iterval to which the minimum is 
restricted is : 



( x >- x ;)/F n + e 
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Thus the accuracy to which the minimum is found depends 
upon the size of the original interval and the number of 
function evaluations to be made. By the nature of Fibonacci 
numbers it is only necessary to make one function evaluation 
per iteration after the initial iteration. 

2. DAVIES, SWANN AND CAMPEY'S SEARCH TECHNIQUE [3,9] 

In this method the function is approximated along the 
line by a quadratic. If the three points used to locate 
the minimum are separated by an interval greater than some 
specified size, the operation is repeated with a smaller 
interval . 

For this method an initial step size is decided upon 
depending upon the estimated distance to the minimum from 
the current point. This step ^ize should be about one 
fourth this estimated distance. The initial step is taken 
toward the minimum and the function is evaluated at this 
new point. If the function has increased then cut the 
step size and begin again from the initial point. This is 
done until a point is found for which the function has de- 
creased. The step size is then doubled and a new step is 
taken from the latest point. This process is continued until 
a function increase is located. At this time the current 
step size is cut in half and a step is taken from the last 
point at which the function decreased. The last four 
points found are thus equally spaced say, s units apart, 
and define an interval in which the minimum must lie. The 
end point of this interval furtherest from the point at 
which the function has the smallest value is discarded and 
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the remaining three points are used to approximate the 
minimum. Let x x , x 2 , and x 3 be these three points with 
£ x , f 2 , and f 3 be the respective function values at these 
points. Let us assume a quadratic approximation for f on 
the line, 

f(t) = at 2 + bt + c. 



in which the values - 1 , 0 , 1 for t correspond to values at 
Xj , x 2 , and x 3 respectively. 

Thus , 

a = (f x -2f 2 +f 3 )/2 

b = (£,-£, )/2 

c = £ ? . • 

T 1 V» c\ c \ r\ -y* 



(f i - 2f 2 -f 3 ) t 2 (fj-fj) t 
f(t) = 9 + 9 + f S 



The minimum of f(t) is at t = (f 3 -f 3 ) /2 ( f 1 - 2f 2 + f 3 ) . 

The point x s corresponding t g is x g = x 2 + t (x 2 -Xj). The 
function is then evaluated at this new point and the pro- 
cess is begun again with a reduced step size from the point 
at which the function had the smallest value. The process 
is continued until the change in successive approximations 
to the minimum is less than half the desired accuracy. 

3. POWELL'S ALGORITHM [3,9] 

This method differs from the previous method only in 
the manner of selecting the three points from which to 
interpolate the minimum. Assuming x^ is the initial point 
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then x 2 = Xj + S is selected where S is a fixed displace- 
ment. The third point, x 3 , is chosen as follows: 



Using these three points another quadratic interpolation 
is performed as outlined above. If the new point differs 
from the point where the function had its smallest value 
by less than the required accuracy then this point is 
assumed to be the desired minimum. Otherwise the point 
at which the function is the largest is discarded and a 
new interpolation is made using the remaining three points. 
4. DAVIDON'S CUBIC INTERPOLATION 

Davidon uses a cubic interpolation based on values for 
the function and its gradient at two points of the line [4, 



Assume that the value of the function and its gradient 
are known at two points, x and y, where y = x + ad. This 
method calls for using a -cubic interpolation as follows. 

Le t , 



x 3 = x r + 2S if f x > f 2 



x 



3 



x, -Siff, <f. 
1 1 2 



13] . 



f(t) = at 3 + bt 2 + ct + d 0 < t < a 



where 



f(x) = f (0) = d 

f(y) = f(a) = aa 3 + ba 2 + ca + d 




c 



Vf (y) 's = g sy 




a 



3aa 2 + 2ba + c. 
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Therefore , 



a “ ( 8 S y“ + 8 sx o ' 2(f y -£ x ))/a ! 

b - (3(f y -f x )-og sy - 2g sx a) /a 2 



d ' f x' 

Thus , 

II 'S = g st - 3at 2 * 2b t + c 

= 3((g sy a * g sx o - 2(f y -f x )t 2 /a ! 

+ 2t(3(f y -f x ) -ng sy - 2g s a)/c< 2 * g sx 
= g. x -2t(g +B)/a + t 2 (g_ + g_+2B)/a 2 

O A. b -rv 0^0/ 

where B = 3(f x -f y )/a * g sx * g sy . 

From the quadratic formula it is found that the desired zero 
is at 






+ |(8sx +B ) + ' 



Vs 



(g SX 2+B2 + 2 g e^ B ) — (gev 2 + S CY g C vr +2B S 



‘sx 



a 2 



sx 



, sx°sy 



? sx 



-4(g + g + 2B) 

a 2 ^ 6 sx 6 sy ' 



a( g sx +B+ Q) 

Tg +g + 2B) 
V6 sx 6 sy 

where 



-V 



*g g 

& sx & sy 
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Usx+Ss/ZB) - (8 s /B-Q) 

«sx + g S y* 2B 



= a 



(g +B-0 

1--JL 5X_ x 



Tg 



+ a 



sx °sy 



2B) 



And thus , 



t = a 
m 



■ (g S y^-^ 

^sy 



(A- 3) 



Since the condition for a minimum along S is that the com- 
ponent of the gradient along S is zero, the above equation 
(A-3) gives an estimate' for this minimum. Thus the estimate 
for k such that. f(x + kS) is a minimum is: 



k = a 

For the solution 
to ensure that a 



(g s /Q-B) 

Tg^g^Q) 

to this problem then 
reasonable choice is 



it is only necessary 
made for a. 
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